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ABSTRACT 

Despite growing consensus that long intergenic non- 
coding ribonucleic acids (lincRNAs) are modulators 
of cancer, the knowledge about the deoxyribonucleic 
acid (DNA) methylation patterns of lincRNAs in can- 
cers remains limited. In this study, we constructed 
DNA methylation profiles for 4629 tumors and 705 
normal tissue samples from 20 different types of hu- 
man cancer by reannotating data of DNA methyla- 
tion arrays. We found that lincRNAs had different 
promoter methylation patterns in cancers. We clas- 
sified 2461 lincRNAs into two categories and three 
subcategories, according to their promoter methy- 
lation patterns in tumors. LincRNAs with resistant 
methylation patterns in tumors had conserved tran- 
scriptional regulation regions and were ubiquitously 
expressed across normal tissues. By integrating can- 
cer subtype data and patient clinical information, we 
identified lincRNAs with promoter methylation pat- 
terns that were associated with cancer status, sub- 
type or prognosis for several cancers. Network anal- 
ysis of aberrantly methylated lincRNAs in cancers 
showed that lincRNAs with aberrant methylation pat- 
terns might be involved in cancer development and 
progression. The methylated and demethylated lin- 
cRNAs identified in this study provide novel insights 
for developing cancer biomarkers and potential ther- 
apeutic targets. 

INTRODUCTION 

Deep sequencing with new computational approaches for 
assembling transcriptome has identified tens of thousands 
of large intergenic transcripts across different tissues and 
cell types. These intergenic transcripts do not code for pro- 
teins and are named long intergenic non-coding ribonu- 
cleic acids (lincRNAs) (1,2). Many lincRNAs are dysreg- 



ulated in human cancers and implicated in disease pro- 
gression through modulating apoptosis, increasing cellular 
oncogenic potential or inhibiting tumor growth (3,4). Al- 
though several lincRNAs [lincRNA_p21 (5), HOTAIR (6), 
PCA3 (7) among others] have been depicted with relatively 
explicit molecular mechanisms in several cancers, little is 
known about the regulatory mechanisms of lincRNAs in 
tumors or normal tissues, especially on regulation by de- 
oxyribonucleic acid (DNA) methylation. 

DNA methylation at gene promoters is crucial for gene 
silencing and involved in many diseases (8). DNA methy- 
lation of lincRNA promoters might be an epigenetic reg- 
ulator of lincRN As expression (9), for instance, lincRNA 
Gh2 {MEG3), whose expression was indirectly regulated 
by mir-29a in hepatocellular carcinoma cells, which in- 
hibited the activity of DNA methyltransferase and caused 
de-repression of MEG3 expression (10). Several lincRNAs 
were upregulated in the human colorectal cancer cell line 
HCT116 by treatment with a DNA-demethylating agent 
(11). However, systematically identifying cancer-related 
methylation patterns of human lincRNAs is still a chal- 
lenge, partly because of a lack of global DNA methylation 
profiles for lincRNAs. 

High-resolution next-generation sequencing and mi- 
croarray technologies have been used for genome-scale 
mapping of DNA methylation (12). Illumina Infinium Hu- 
manMethylation450 BeadChip Array (Infinium 450k) has 
485 577 probes that comprehensively cover most known 
CpG islands (CGIs) and 99% of NCBI Reference Sequence 
genes (13). The Cancer Genome Atlas (TCGA) Research 
Network contains a large number of data sets with Infinium 
450k arrays for thousands of tumor samples with corre- 
sponding normal samples and matched clinical annotations 
that are all pubHcly available (14-16). Previous studies suc- 
cessfully extracted lincRNA expression information by re- 
purposing the microarray data, which were originally de- 
signed to detect the expression of genes or exons (17-19). 
By reannotating the Infinium 450k array, we could obtain 
lincRNA methylation levels in a large number of samples. 
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We developed a computational strategy to reannotate 
the Infinium 450k array and observed that DNA methy- 
lation level of lincRNA promoters was tightly linked to 
lincRNA transcription. We constructed DNA methylation 
profiles for 20 distinct types of cancer according to lin- 
cRNA promoter methylation levels. We classified the lin- 
cRNAs into two categories and three subcategories and 
found that lincRNAs with resistant methylation patterns 
in tumors had conserved transcriptional regulation regions 
and were ubiquitously expressed across normal tissues. By 
analyzing the lincRNA methylation profiles together with 
clinical information for tumors in breast invasive cancer 
(BRCA), lung squamous cell cancer (LUSC) and uterine 
corpus endometrioid cancer (UCEC) among others, and 
subtype data of tumors in BRCA (20) and LUSC (21), we 
identified lincRNAs with promoter methylation patterns 
that were associated with cancer status, subtype or prog- 
nosis. These lincRNAs could be further evaluated for use 
as cancer biomarkers and potential cancer therapy targets. 
Some lincRN As with aberrant methylation patterns in can- 
cers might involve in cancer development and progression. 
Early detection of hypermethylated or hypomethylated lin- 
cRNAs could serve as cancer biomarkers for diagnosis or 
treatment. 

MATERIALS AND METHODS 

Data sources 

DNA methylation data from Infinium 450k arrays and pa- 
tients clinical data were downloaded from TCGA (https: 
//tcga-data.nci.nih.gov/tcga/). RNA-seq data for 16 tissues 
were derived from Human Body Map 2 project (SRA, E- 
MTAB-513)(1). RNA-seq data of six cell lines were from 
Gene Expression Omnibus (GEO, GSE23316) (22). Corre- 
sponding lincRNA expression data from patients were cal- 
culated based on the RNA-seq V2 data from TCGA. CGI 
annotation and repetitive element (RE) annotation data 
were from UCSC Genome Browser (23). Annotation files 
for lincRNAs and protein-coding genes (PCGs) were down- 
loaded from Human lincRNA Catalog (1). 

Re-annotating data from the Infiniuni 450k array to construct 
lincRNA methylation profiles 

We mapped 485 577 probe sequences (50 bp in length) to 
human genome (hgl9) with BLAT (24). We treated BLAT 
output in two steps: first, we retained the probes uniquely 
mapped to a single location in human genome with a max- 
imum of two mismatches. Second, we eliminated the se- 
quences with gaps. A total of 485 512 probe sequences from 
Infinium 450k arrays were uniquely mapped. We then as- 
signed the probe sequences into four lincRNA-associated 
regions according to the Human lincRNA Catalog annota- 
tion file (1): regions 10 kb upstream from the transcription 
start sites (TSSs), regions 10 kb downstream from the tran- 
scription termination sites (TTSs), exons and introns. We 
only retained the probe sequences exclusively mapped to a 
single region. 

To estimate the methylation level of a given probe, we 
used the beta value: the ratio of intensities between methy- 
lated and unmethylated alleles. The beta value and cor- 



responding P-value of each probe were obtained from 
the level 3 Infinium 450k data in TCGA. Beta value — 
4ieth/(4ieth+ /unmeth), whcrc /meih IS the intensity of methy- 
lation and /unmeth is the intensity of unmethylation. We only 
used the beta values with significant detection P-values {P 
< 0.05) in calculations to avoid using the missing data. 

For each type of cancer, we constructed lincRNA methy- 
lation profiles using the methylation levels of the probes 
mapped into 10 kb upstream from the TSSs of the lincR- 
NAs. In research on a cw-regulatory element annotation 
system, Liu et al. specified the largest promoter size as 10 
kb upstream from the TSS (25). Since the regulatory mech- 
anism of lincRNAs transcription is similar to the regulation 
of genes (26), we used 10 kb upstream from the TSS for a 
relatively comprehensive range of lincRNA promoters. We 
used only the probes closest to each TSS to determine the 
DNA methylation status of hncRNA promoters. 

Classification of lincRNAs with divergent methylation pat- 
terns in cancers 

We classified lincRNAs into two categories: prone to methy- 
lation (PM) lincRNAs and resistant to methylation (RM) 
lincRNAs. LincRNA promoters with beta value < 0.3 were 
considered as unmethylated promoters and those with beta 
value > 0.3 were considered as methylated ones. These cut- 
offs and strategies were similar to those in previous stud- 
ies (27,28). LincRNAs with methylated promoters in more 
than 20% of all tumor samples were defined as PM lincR- 
NAs. LincRNAs with unmethylated promoters in all tu- 
mors were named RM lincRNAs. PM lincRNAs that were 
methylated in more than 5% of tumors for each cancer 
were defined as consistently methylated (CM) lincRNAs. 
PM lincRNAs that were unmethylated in tumor samples for 
at least one cancer were classified as variable methylation 
(VM) lincRNAs. 

Analysis of REs at lincRNA promoters 

We obtained the position information of the REs from 
the Repeat Masker track (RMSK) in the UCSC Genome 
Browser (hgl9) (29). We divided the region ±10 kb around 
lincRNA TSS into 20 equal-sized bins. The REs were con- 
sidered to exist if they overlapped with the bins. We plotted 
the frequency of the REs in each bin for lincRNAs belong- 
ing to the three categories. We tested differences between the 
categories using Fisher's exact tests based on the density of 
the REs in an interval ± 2 kb around TSSs. 

Analysis of evolutionary conservation for lincRNA promoters 

We used the measurements of base substitutions in 46 pla- 
cental mammals (phastCons46way, UCSC) to analyze the 
evolutionary conservation for lincRNAs in different cate- 
gories. We separated the region upstream and downstream 
10 kb from the TSS of a lincRNA into 20 non-overlapping 
intervals, taking direction of transcription into account. We 
calculated the mean Phastcons scores for each interval. We 
tested the significance of differences between categories us- 
ing scores calculated for intervals ± 2 kb around TSSs with 
Wilcoxon rank sum tests. 
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Predicting the functions of lincRNAs 

We used two strategies to predict the functions of lincRNAs. 
According to lincRNA c«-regulatory mechanism (30), we 
used the PCGs adjacent to the target HncRNAs to infer the 
potential functions of lincRNAs with three rules; first, the 
PCGs must locate on the same strand with the target lin- 
cRNAs. Second, the PCGs were nearest to the target lincR- 
NAs. Third, only PCGs within 10 kb from the target lincR- 
NAs were retained. We also performed functional enrich- 
ment analysis of the PCGs co-expressed with the lincRNAs 
to predict the potential functions of lincRNAs in cancers. 

Obtaining lincRNA expression values from the RNA-seq V2 
data in TCGA 

We recalculated the RPKM values for lincRNAs using the 
RNA-seq V2 data in TCGA (31). RPKM = (raw read 
counts X 10^)/(total reads x length of lincRNA JC), where 
raw read counts — sum of raw read counts in all exons en- 
tirely mapped to the lincRNA locus; total reads — sum of 
raw read counts calculated for all transcripts of a sample; 
and length of lincRNA_X — sum of length of exons mapped 
to the lincRNAJC locus. To avoid ambiguous exons map- 
ping, we merged the overlapping lincRNA transcripts into 
a single candidate HncRNA. 

Identifying lincRNAs with prognosis- or cancer subtype- 
associated promoter methylation patterns 

We calculated Kaplan-Meier log-rank /"-values to identify 
lincRNAs with overall survival (OS)-associated methyla- 
tion patterns. Tumors were separated according to the me- 
dian methylation of each lincRNA. For each cancer, we di- 
vided tumors into a discovery set and a validation set. In the 
discovery phase, we retained lincRNAs with significant log- 
rank /"-value < 0.05. We permuted the labels for tumors in 
the discovery set 5000 times to calculate the background dis- 
tribution of log-rank /"-values for each lincRNA. We then 
estimated the false discovery rate (FDR) for each lincRNA 
using its own background (32). We validated only lincRNAs 
under a threshold of FDR — 0.01 in the vaHdation set. We 
identified lincRNAs with prognosis-associated (PA) methy- 
lation patterns in cancers with tumor sample size available 
for both clinical and methylation data > 200 and a censor- 
ing (alive sample) rate < 0.9; or the tumor sample size < 200 
and a censoring rate < 0.8 for an effective survival analysis 
(Supplementary Table SI) (33). We performed r-tests (one- 
tailed) to compare the lincRNA methylation pattern of pa- 
tients in each subtype to those in other subtypes in each can- 
cer. The lincRNAs that showed statistically higher or lower 
methylation (FDR < 0.05) in only one subtype were con- 
sidered as having subtype-specific methylation patterns. 

Statistical analyses 

Functional enrichments of PCGs were consisted on the 
Fisher's exact test (two-tailed) implemented by DAVID v6.7 
(http://david.abcc.ncifcrf gov/) (34). Aberrantly methylated 
(AM) lincRNAs between tumors and corresponding nor- 
mal samples were identified by ?-test (two-tailed), when 
FDR < 0.05 and laverage beta value of tumors — average 
beta value of normal samples! > 0.3. 



RESULTS 

A reannotation strategy for constructing DNA methylation 
profiles of lincRNAs 

To characterize DNA methylation patterns for lincRNAs, 
we designed a computational strategy to reannotate data of 
Infinium 450k arrays into four human hncRNA-associated 
regions (Figure lA). In total, 3361 HncRNAs had at least 
one probe sequence uniquely mapped to one of the four re- 
gions (Supplementary Table S2). Most probe sequences cor- 
responded to introns (69 11, 47%) or exons (3447, 23%). Al- 
though a substantial set of probe sequences mapped to the 
regions 10 kb upstream from the TSSs, we retained only the 
probes closest to each TSS to determine the DNA methy- 
lation status of lincRNA promoters (2461, 13%). The re- 
maining probe sequences (2001, 13%) were annotated clos- 
est to the regions 10 kb downstream from the TTSs (Figure 
IB). To determine the validity of the reannotation strategy, 
we annotated the probe sequences to PCG loci using the 
same strategy. The results were consistent with the previ- 
ous probe-PCG annotations provided by Illumina. To de- 
termine the reliability of lincRNA methylation status, we 
used data from Infinium 450k arrays and reduced represen- 
tation bisulfite sequencing (RRBS) of nine cell lines from 
the ENCODE project (22). Methylation levels of probes an- 
notated to lincRNAs were consistent with levels detected by 
RRBS (Supplementary Figure SI). In each cell line, all de- 
tected probe sites showed significant concordance between 
Infinium 450k array and RRBS for methylation, including 
probes annotated to the four lincRNA-associated regions 
(Supplementary Figure S2) and probes annotated only to 
lincRNA promoters (Supplementary Figure S3). 

We compared the DNA methylation patterns for each re- 
gion: promoters, exons, introns and the regions 10 kb down- 
stream from the TTSs for lincRNAs within each of the 
10 expression quantiles with the Infinium 450k array and 
RNA-seq data of an Hl-hESC cell Hne from the ENCODE 
project (22) (Figure IC). Compared with the other three 
regions, hypermethylation of promoters was more tightly 
linked to transcriptional silencing of lincRNAs (Supple- 
mentary Figure S4). Therefore, we constructed lincRNA 
promoter methylation profiles for 20 cancer types including 
4629 tumors and 705 corresponding normal tissue samples 
(TCGA; Supplementary Table S3). 

Dissecting lincRNAs promoter methylation patterns in can- 
cers 

For the lincRNAs in this study, the average DNA methy- 
lation levels showed significant differences between tumors 
and normal samples in 18 of the 20 cancer types (?-test, 
P < 0.05), with lower methylation levels in tumors than 
normal samples for 1 5 cancer types. Exceptions were kid- 
ney renal papillary cell cancer (KIRP), brain lower grade 
gHoma (LGG) and prostate adeno cancer (PRAD) (Figure 
2A). Since disrupting DNA methyltransferases may pro- 
mote chromosome instability and tumor progression, can- 
cer cells are usually less methylated at individual CpG dinu- 
cleotides than healthy cells (35-38). The lower average DNA 
methylation levels of lincRNAs in tumors than in corre- 
sponding normal samples for most cancer types were con- 
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Figure 1. Computational strategy for reannotating Intinium 450k array data to construct lincRNA methylation profiles. (A) Probe reannotation pipeline for 
lincRNAs. (B) Pie chart with distribution and the number of probes annotated by functional region for all collected lincRNAs. (C) Methylation patterns of 
hncRNAs by 10 expression quantiles (lowest 10"/o-100%). Box plots show methylation levels in promoters, exons, introns and the regions 10 kb downstream 
from the TTS. 



sistent with the global hypomethylation patterns of PCGs 
in cancer cells. In contrast, hypermethylation of lincRNAs 
might be involved in DNA repair, tumor cell invasion, cell 
cycle regulation and other events in which silencing might 
induce metastasis (38). Aberrant promoter methylation was 
frequently observed in cancer samples and might have con- 
tributed to tumor progression by silencing tumor suppres- 
sor genes or activating oncogenes. Therefore, we explored 
the methylation patterns at lincRNA promoters in each 
cancer type by dividing the 10-kb region upstream of the 
TSS into 10 equal-sized bins. We obtained three representa- 
tive cancer type-specific methylation patterns for 20 cancer 
types, and examples were shown in bladder urothelial can- 
cer (BLCA), head and neck squamous cell cancer (HNSC) 
and LGG (Supplementary Figure S5). We then assigned 
CGIs and CpG shores (±2-kb regions from CGI start or 
end sites) in the promoter regions and obtained two repre- 
sentative cancer type-specific methylation patterns accord- 
ing to the methylation levels of probes mapped to each re- 
gion, and examples were shown in BLCA and LGG (Sup- 
plementary Figure S6). 



We performed unsupervised hierarchical clustering on 
the average promoter methylation profiles of lincRNAs for 
20 types of cancer. The results suggested that part of the 
lincRNAs had RM status in cancers, some lincRNAs had 
CM status in cancers and the others had VM patterns. Can- 
cers with similar lincRNA methylation patterns were clus- 
tered together. Three pairs of cancers with adjacent tissue 
of origins showed similar lincRNA methylation patterns: 
glioblastoma multiforme (GEM) and LGG, colon adeno 
cancer (COAD) and rectum adeno cancer (READ) and kid- 
ney renal clear cell cancer (KIRC) and KIRP (Figure 2B). 

To determine lincRNA methylation patterns in cancers, 
we classified the lincRNAs into two categories and three 
subcategories according to their methylation profiles of tu- 
mors (Figure 2C). We obtained 1854 PM HncRNAs and 67 
RM lincRNAs. By subdividing the PM lincRNAs, we ob- 
tained 1693 (91.32%) CM lincRNAs and 52 (2.80%) VM 
lincRNAs (Figure 2D and Supplementary Table S4). Us- 
ing our methodology, CM lincRN As had the significantly 
highest median methylation levels and RM lincRNAs had 
the lowest levels in tumors (Figure 2E). As an important 
regulating factor of gene expression in cancers (39), DNA 
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Figure 2. Dissecting lincRNA promoter methylation patterns in cancers. (A) Bar plots with average methylation levels of lincRNA promoters in tumors and 
corresponding normal samples (i-test, *F < 0.05, **/■ < l.Oe— 3 and ***/>< l.Oe— 4). Error bars, mean ± SEM. (B) Unsupervised hierarchical clustering of 
average methylation profiles for 2461 lincRNAs in 20 cancer types. GBM-LGG, KIRC-KIRP and COAD-READ were three pairs of cancers with similar 
tissue of origin. PRAD, BRCA, STAD, LUAD and PAAD were cancers arising from adeno. CESC, HNSC and LUSC were cancers arising from squamous 
cells. SARC and SKCM were sarcomatoid carcinomas. (C) Strategy used to segregate lincRNAs into sets with distinct methylation patterns. (D) Pie chart 
with the number of lincRNAs in different categories. (E) Box plots show methylation level of lincRNAs in different categories. Differences between sets 
were tested using Wilcoxon rank sum tests. (F) Box plots of DNA methylation and expression levels of VM, CM and RM lincRNAs for three cancers. 
Methylation and the corresponding expression values were obtained from consistent sample sets. 



methylation might be involved in regulating lincRNA ex- 
pression in cancers. We examined both methylation levels 
and expression levels of CM, VM and RM lincRNAs us- 
ing the Infinium 450k array data and the RNA-seq V2 data 
for COAD, GBM and HNSC. In all three cancers, RM lin- 
cRNAs showed the overall lowest median methylation level 
and the highest median expression level (Figure 2F). Our 
results indicated that the three different lincRNA methy- 
lation patterns in cancers were related to lincRNA expres- 
sion. Many lincRNAs have been identified as having regula- 
tory functions in cancer-related pathways such as the MYC 
and p53 pathways (40). Therefore, we might be able to in- 
fluence lincRNA expression by altering DNA methylation, 
thus disrupting the functions of lincRNAs in cancers. Fur- 
ther analysis of the three different DNA methylation pat- 
terns might help identify novel drug targets or cancer diag- 
nostic biomarkers. 



RM lincRNAs had the most conserved promoter regions and 
the least tissue-specific expression in normal tissues 

Since REs are involved in reprogramming of DNA methy- 
lation (41,42), we investigated whether REs affected Hn- 
cRNA methylation patterns. We quantified the REs around 
the TSSs of lincRNAs using the RMSK data from UCSC 
Genome Browser (43). All three major RE classes (LINEs, 
SINEs and LTRs) were depleted from lincRNA core pro- 
moters (2 kb upstream from the TSSs) (Figure 3A). More- 
over, RM lincRNAs had significantly fewer REs than CM 
lincRNAs, possibly caused by activated DNA methylation 
of REs in lincRNA promoters. RE insertion close to a 
lincRNA promoter or RE hypermethylation might inter- 
rupt the transcription factors or other regulatory elements 
binding to lincRNA promoters, which could contribute to 
lincRNAs tissue-specific expression. We quantified the tis- 
sue specificity of lincRNA expression in 1 6 normal tissues 
(SRA, E-MTAB-513) and six cell lines (GEO, GSE23316) 
using an information theory method (Supplementary file) 
(44). CM lincRNAs had significantly higher tissue-specific 
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Figure 3. RM lincRNAs had conserved promoters. (A) RM lincRNAs were depleted of REs at promoters. Graptis stiow frequency of LINEs, SINEs and 
LTRs at I-kb intervals around TSSs of CM, VM and RM lincRNAs. Significance of differences in densities was determined by Fisher's exact tests for 
repeat counts ± 2 kb from the TSSs. (B) RM lincRNAs had the lowest tissue-specific expression in normal tissues. Shown are cumulative distributions of 
tissue-speciticity scores for CM, VM and RM lincRNAs. Differences between lincRNA sets were tested using Wilcoxon rank sum tests (***/> < 0.001). 
(C) RM lincRNAs had evolutionarily conserved promoters. Shown are the graphs of conservation level in 500-bp intervals around the TSSs of CM, VM 
and RM lincRNAs. Conservation was determined by measuring the rate of base pair substitutions between species. Significance of observed differences 
between two categories was assessed using the Wilcoxon rank sum test for scores ± 2 kb around the TSSs (***_p < l.Oe— 3). 



expression than RM lincRNAs, which was consistent with 
our hypothesis (Figure 3B). To quantify the evolutionary 
conservation of lincRNA promoters, we used phastCons 
scores of placental mammals (45). The core promoters of 
lincRNAs showed the most conserved profiles. RM Lin- 
cRNAs showed significantly greater conservation than CM 
hncRNAs at core promoters (Figure 3C). 

We performed functional enrichment analysis of genes 
co-expressed with the RM, VM and CM lincRNAs (Pear- 
son's correlation test, top 5% ofP < 0.05) for BRCA, LUSC 
and GBM (46,47). RM genes (PCGs co-expressed with RM 
lincRNAs) in three cancers shared the GO terms 'regulation 
of transcription' and 'transcription'. CM genes (PCGs co- 
expressed with CM lincRNAs) were enriched in GO terms 
'immune response', 'cell cycle' and 'chromatin modification' 
among others (Supplementary Figure S7). For VM genes 
(PCGs co-expressed with VM lincRNAs), there were no sig- 
nificant functional enrichment results. In addition, 49 RM 
hncRNAs had homologous sequences in mice and 21 had 
homologs in zebrafish (48). RM lincRNAs had conserved 
transcriptional regulation regions and conserved sequences 
in multiple species, which suggested an evolutionary de- 
mand for correct regulation and expression of RM lincR- 
NAs. 



LincRNAs had promoter methylation patterns associated 
with cancer status, subtype and prognosis 

Since aberrant promoter methylation silences tumor sup- 
pressor genes and activates oncogenes (49), we analyzed 
the different methylation patterns of lincRNA promoters 
between tumors and corresponding normal tissue samples. 
For example, 126 lincRNAs showed significantly aberrant 
methylation patterns in tumors compared to correspond- 
ing normal samples, including 24 hypermethylated and 28 
hypomethylated lincRNAs for BRCA, and 14 hypermethy- 
lated and 60 hypomethylated lincRNAs for LUSC (Figure 
4A and B and Supplementary Table S5). Most AM lincR- 
NAs belonged to the CM category, indicating that these lin- 
cRNAs are consistently methylated in other types of tumors 
(Figure 4C and D). The hypomethylated CM lincRNAs in 
BRCA or LUSC showed a more common methylation pat- 
tern in normal samples than in tumors. 

We compared the methylation patterns of lincRN As for 
different subtypes of BRCA (basal-like, HER2-enriched, 
luminal A, luminal B and normal-like) (20) and LUSC 
(basal, classical, primitive and secretory) (21). We identified 
the lincRNAs with subtype-specific methylation patterns 
in BRCA and LUSC (Figure 4E and F and Supplemen- 
tary Table S6). Since tumors in each cancer molecular sub- 
type had distinctive biological and clinical behaviors, lincR- 
NAs with subtype-specific methylation patterns might have 
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Figure 4. LincRNAs whose methylation patterns were associated with cancer status or subtypes. (A, B) Heat maps of bidirectional hierarchical clustering 
of lincRNAs with significantly different methylation levels between BRCA and normal breast (A) or LUSC and normal lung (B). (C, D) Venn diagrams 
showed that most of the AM lincRNAs in BRCA (C) or LUSC (D) belonged to the CM category. (E, F) Heat maps showing the methylation profiles of 
the top 5% lincRNAs with significantly different methylation levels (FDR < 0.05) in the basal-like subtype compared to the others for BRCA (E) and in 
the classical subtype compared to the others for LUSC (F). LincRNAs are ranked by ascending order of r-test FDR values. 



crucial functions in these subtypes. Several lincRNAs with 
subtype-specific methylation patterns have been function- 
ally implicated in physiological or pathological processes 
through experimental validation. For instance, HOTAIR, 
a lincRNA hypomethylated in basal-like subtype BRCA, 
was highly expressed in metastatic breast cancers. Its high 
level of expression in primary breast tumors might pre- 
dict subsequent metastasis and death (6). MEGS, which 
was hypomethylated in the luminal A subtype of BRCA, 
is an imprinted long non-coding RNA (50). MEG3 acted 
as a growth suppressor in tumor cells and activated p53 
(51). In addition, HOTTIP, which binds the WDR5 pro- 



tein and forms a complex with the histone methyltrans- 
ferase protein MLL to target the WDR5-MLL complex to 
the HOXA region for transcriptional activation of HOXA 
(52), also showed subtype-specific methylation patterns in 
both BRCA (basal-like) and LUSC (classical). 

We combined the lincRNA methylation profiles with clin- 
ical annotations and identified a subset of lincRNAs with 
methylation values showing a trend associated with OS in 
BRCA, LUSC and UCEC. We used a validation set as an 
independent data set to validate candidate reliability. For 
UCEC, we obtained 23 PA lincRNAs in the validation 
set from 30 lincRNAs in the discovery set (FDR < 0.01). 
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For BRCA, we validated five lincRNAs associated with OS 
from the top 10 lincRNAs in the discovery set (FDR < 
0.01). For LUSC, we obtained seven PA lincRNAs (Supple- 
mentary Tables S7-S9). For example, BRCA patients with 
lower methylation level of lincRNA XLOCM9284 had bet- 
ter prognosis (Figure 5A). LUSC patients with relatively 
lower methylation level of HncRNA XLOCM9367 showed 
poorer prognosis (Figure 5B). For UCEC, patients with the 
highest methylation level of HncRNA XLOCM7617 had a 
better prognosis than patients with lower methylation level 
(Figure 5C). GAS5, a lincRNA Hnked to apoptosis that is 
involved in progression of some types of cancers, was signifi- 
cantly correlated with prognosis for UCEC (53,54). The lin- 
cRNA MEGS was associated with OS in the LUSC discov- 
ery set but not in the validation set, which inhibits prolifera- 
tion of non-small cell lung cancer cells and induces apopto- 
sis by affecting p53 expression (55). We hope to further val- 
idate candidates from the discovery set in the future, using 
a suitable tumor set. Besides, BRCAl, the ovarian cancer 
marker PCG, showed no correlation between methylation 
level and OS in a previous study (14). There were 42 lin- 
cRNAs showed a correlation between the methylation level 
and OS in BRCA, LUSC, UCEC, KIRC and LGG, some of 
which showed a negative correlation between methylation 
and expression in corresponding tumors (Pearson's corre- 
lation test; Supplementary Table SIO), suggesting their po- 
tential as novel prognostic biomarkers. 

We then used drug-free survival analysis to evaluate the 
possibility of promoter methylation of lincRNAs as drug 
targets. The drug-free interval was defined as from the 
end of chemotherapeutic drug treatment to the date of 
progression or recurrence or last contact (censored) (56). 
XLOCM7617 was a lincRNA that showed positive cor- 
relation between its methylation and drug-free survival in 
UCEC (log-rank P = 0.013; Supplementary Figure S8). In 
addition, we defined a methylation survival (MS) score us- 
ing the 23 previously verified PA lincRNAs for UCEC. For 
each UCEC tumor, a point was given if the methylation level 
of a PA lincRNA was higher than the median methylation 
and was associated with longer OS or vice versa. The MS 
score of each tumor was assigned as the sum of the points. 
Patients were designated as sensitive for complete or partial 
response to platinum chemotherapy in the clinical data from 
TCGA (57) and as resistant for stable or progressive disease. 
Patients with higher MS scores were more sensitive to drugs 
(Supplementary Figure S9). Among patients whose tumors 
had MS scores higher than the median MS score, 87% were 
sensitive compared with 61% of patients with tumors with 
lower MS scores (P — 0.037, x ' test). These results indicated 
that MS scores generated using lincRNA methylation levels 
might be used to predict patient sensitivity to chemothera- 
peutic drugs. Therefore, lincRNAs, whose promoter methy- 
lation patterns were associated with cancer status, subtype 
and prognosis, should be further studied as potential and 
novel cancer biomarkers. 

Functional analyses of AM lincRNAs in cancers 

Using the AM lincRNAs between tumors and correspond- 
ing normal samples identified from 14 types of cancers 
with at least seven normal samples, we constructed an AM 



lincRNA-cancer network (AMCN; Supplementary Figure 
SlOA). The AMCN had two types of nodes: cancers and 
AM lincRNAs. Edges existed only between a lincRNA and 
cancer when the lincRNA was aberrantly methylated in 
the cancer. The AMCN illustrated that most lincRNAs 
were aberrantly methylated in a single cancer and a few 
lincRNAs were aberrantly methylated in multiple cancers 
(Supplementary Figure SlOB). A total of 196 HncRNAs 
were aberrantly methylated in more than one cancer out 
of all 434 AM HncRNAs in the AMCN. Thirty one AM 
lincRNAs showed pairwise appearing in more than three 
types of cancer (Supplementary Figure SI 1). The HncRNA 
XLOCJ)13592, located in chromosome 20, co-occurred with 
other AM lincRNAs in six types of cancer. Chromosome 
5 contained up to seven lincRNAs that co-occurred with 
other AM lincRNAs. By removing lincRNA nodes from 
the AMCN, we obtained a network of cancers (Figure 6A). 
Some pairs of cancers shared more AM lincRNAs than oth- 
ers. For instance, COAD and READ, two cancers that orig- 
inate in the intestine, shared 69 AM lincRNAs, with 20 lin- 
cRNAs aberrantly methylated uniquely in these two can- 
cers. Additionally, BLCA and UCEC shared 56 AM lincR- 
NAs, indicating that these two cancers might share a com- 
mon pathogenesis. Clinically, a high metastatic rate from 
UCEC to BLCA was seen (58). Furthermore, 191 AM lin- 
cRNAs showed consistent hypermethylated or hypomethy- 
lated status in diverse cancers. Five AM lincRNAs showed 
altered hypermethylation or hypomethylation status in five 
pairs of cancers (Figure 6B). 

Except for lung adeno cancer, 238 lincRNAs aberrantly 
methylated in only one cancer were named uniquely aber- 
rantly methylated (UAM) lincRNAs (Figure 6C). UCEC 
contained 60 UAM lincRNAs and liver hepatocellular can- 
cer contained 51, amounting to nearly 47% of the total 
UAM lincRNAs. Theoretically, a lincRNA could intrin- 
sically cw-regulate its neighbor PCGs by binding to its 
own locus. Thus, we used the PCGs neighbored to the tar- 
get lincRNAs to infer the putative functions of the lincR- 
NAs according to 'guilt by association' strategy (30,59). In 
UCEC, XLOCJ)13045 and XLOCJ)13050 were found to 
be adjacent to zinc -finger protein genes (ZNF181, ZNF30, 
ZNF404, ZNF45). The expression level of XLOC.013350 
was negatively correlated with its methylation level (Pear- 
son's correlation coefficient, PCC — —0.63, P < 0.05) and 
positively correlated with the expression of ZNF404 (PCC 
— 0.26, P < 0.05). We also observed a positive correla- 
tion between the expression of XLOC-013045 and ZNF181 
(PCC = 0.18, P < 0.05), indicating that these two HncRNAs 
may be involved in cell growth and apoptosis. In PRAD, lin- 
cRNA XLOC-002726 was significantly hypermethylated in 
tumors and showed a negative correlation between its ex- 
pression and methylation (PCC — —0.26, P < 0.05), which 
was a newly found susceptibility locus for prostate cancer 
in genome-wide association studies (1,60). CADM2, an up- 
stream gene near XLOC-002726 on the same strand, is a 
prostate cancer suppressor gene (61). Furthermore, with 
the 24 AM lincRNAs found for PRAD, we built five clas- 
sifiers based on Bayes network, naive Bayes, random for- 
est, logistic regression and radial basis function network 
models to identify patient tumors from normal samples. All 
five classifiers showed good performance by 10-fold cross- 
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Figure 5. LincRNAs with PA methylation patterns in BRCA, LUSC or UCEC. (A) Kaplan-Meier curves for discovery-set patients (n = 282) with higher 
(top 50%; « = 141) or lower (bottom 50%; n = 141) methylation of XLOCD09284 in BRCA (left). Kaplan-Meier curves for validation-set patients as 
above (right). (B) Kaplan-Meier curves for discovery-set patients {n = 96) with higher (top 50%; n = 48) or lower (bottom 50%; n = 48) methylation 
oi XLOC .009367 in LUSC (left). Kaplan-Meier curves for validation-set patients as above (right). (C) Kaplan-Meier curves for discovery-set patients 
(n = 171) with higher (top 50%; n = 86) or lower (bottom 50%; n = 85) methylation levels oi XLOCm7617 in UCEC (left). Kaplan-Meier curves for 
validation-set patients as above (right). The methylation differences between patients sets were tested using Wilcoxon rank sum tests (***/" < l.Oe— 3). 
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Figure 6. AM lincRNAs in cancers. (A) Network of cancers. The width and shades of color of the edges between two cancer types correlate with the 
numbers of shared AM lincRNAs. (B) AM lincRNAs with altered methylation status between cancers. (C) UAM LincRNAs. Dark gray, UAM lincRNA 
hypermethylation in tumors. Light gray, UAM lincRNA hypomethylated in tumors. 



validation (Supplementary Table Sll). Therefore, lincR- 
NAs with aberrant methylation patterns in cancers might 
be involved in cancer development and progression. Early 
detection of hypermethylation or hypomethylation of lin- 
cRNAs might serve as biomarkers for cancer diagnosis or 
treatment. 

DISCUSSION 

Epigenetic factors tightly control expression patterns of lin- 
cRNAs (62). For example, DNA methylation disrupted a 
long non-coding RNA activity by affecting expression in a 
lethal lung developmental disorder (63). To determine the 
DNA methylation patterns for lincRNAs in human cancers, 
we developed a strategy to reannotate Infinium 450k array 
probes to lincRNA loci and constructed lincRNA methyla- 
tion profiles for tumor patients. We investigated the patterns 
of lincRNA methylation in different cancer types and the 
functions of lincRN As in cancers. By clustering analyses of 
lincRNA methylation levels in cancer, we revealed that some 
types of cancer had similar lincRNA methylation patterns 
and classified lincRNAs according to their methylation pat- 
terns in tumors. By integrating cancer subtype data and pa- 
tients clinical information, we identified lincRNAs whose 
promoter methylation status was associated with cancer sta- 



tus, subtype and prognosis. By network analyses, we in- 
vestigated the functions of AM lincRNAs in cancers. By 
literature mining, we validated that a few AM lincRNAs 
were efficacious in cancer development and progression. 
Experimentally validating the potential tumor-promoting 
functions of these candidate lincRNAs in cancers would be 
meaningful. LincRNAs whose promoter methylation status 
was associated with cancer status, subtype and prognosis 
could be investigated as disease signatures. 

Two lincRNA catalogs were generated by Cabili et al: 
a predicted catalog and a stringent catalog (1). During the 
reannotating process, we only considered the stringent lin- 
cRNA catalog with nine additional known lincRNAs, for 
these lincRNA transcripts might be more rehable (1). How- 
ever, there were still some lincRNAs closed to the annotated 
PCGs. The probes used to evaluate PCGs might be mapped 
to nearby lincRNA-related regions. To avoid using these 
probes, we reannotated a subcatalog of 2167 lincRNAs that 
were more than 20 kb from the PCGs. Since we retained 
only probes annotated 10 kb upstream or downstream from 
lincRNAs or PCGs, probes annotated to the subcatalog of 
lincRNAs could not be related to PCGs. Methylation of the 
subcatalog of lincRNAs within each of the 10 expression 
quantiles showed that the lower the promoter methylation. 
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the higher the expression of the corresponding HncRNAs. 
Methylation of the other three regions was not related to 
the expression (Supplementary Figure S12). Unsupervised 
hierarchical clustering of the average promoter methylation 
profiles of these lincRNAs showed CM, VM or RM pat- 
terns in 20 types of cancer (Supplementary Figure SI 3). Al- 
though the average promoter methylation levels of PRAD 
were higher than the corresponding normal samples, the 
trends of the other sample sets were consistent with previ- 
ous analyses of the other 19 cancers (Supplementary Figure 
SI 4). Therefore, the methylation patterns of lincRNAs were 
maintained based on a small stringent set of lincRNAs. 

The relationship between exonic and intronic methyla- 
tion of lincRNAs in the Hl-hESC cell line and their expres- 
sion levels (Figure IC) were not what we expect from PCGs, 
in which highly expressed genes have been shown to be more 
methylated in their gene bodies than genes expressed at low 
levels (64,65). However, the relationship between expres- 
sion levels and gene -body methylation in PCGs has been 
shown to be complex in recent studies. For example, some 
tissue types showed a correlation between expression and 
gene-body methylation, whereas others showed no clear re- 
lationship (66,67). A relation between gene-body methyla- 
tion and evolution has been suggested. For example, genes 
expressed at moderate levels had the highest methylation 
levels in some plants and invertebrates (68,69). Further- 
more, the initiation and elongation of transcription showed 
different sensitivity to DNA methylation silencing in dif- 
ferent genomic and cellular contexts (70). Although lincR- 
NAs and PCGs both undergo the transcription process, lin- 
cRNAs have a much lower expression level and sequence 
conservation than PCGs (1), which could result in complex 
methylation patterns similar to those of PCGs. Therefore, 
the methylation patterns of lincRNA exons and introns in 
different species, cell types and phenotypes need to be fur- 
ther investigated. 

The observed lower tissue-specific expression patterns 
and the higher promoter conservation of RM lincRNAs 
are consistent with the high conservation score of ubiq- 
uitously expressed lincRNAs (26). However, this pattern 
differed from that of PCGs, where PM promoters were 
more conserved and more depleted of REs (28), indicating 
that REs may play a different role in reprogramming the 
DNA methylation of lincRNA promoters. A recent study 
found that repetitive and transposable elements occurred 
in more than two-thirds of mature long non-coding RNA 
transcripts, particularly at their TSSs, whereas they seldom 
occurred in protein-coding transcripts (71), suggesting that 
they may play a role in the regulation of long non-coding 
RNA transcription (72). Since REs tend to be aberrantly 
methylated in human cancers (73,74), they might play a spe- 
cific role in altering lincRNA promoter methylation levels 
and further affect lincRNA transcription in cancer. The de- 
pletion of REs observed at RM lincRNAs may reflect a need 
to preserve their stable methylation patterns in cancer. 

In summary, we studied the functions and mechanisms of 
DNA methylation of lincRNAs in human cancers by rean- 
notating publicly available data and integrating them with 
genomic analyses. The identified cancer-associated or clini- 
cally relevant lincRNAs could be further evaluated for use 
as cancer biomarkers and potential therapeutic targets. 
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